Failure Tolerant Training with Persistent Memory Disaggregation over CXL

Authors: Choi Hanjin, Jang Junhyeok, Jung Myoungsoo, Kwon Miryeong, Lee Sangwon
Publication date: 19/01/2023

This paper proposes TRAININGCXL, which efficiently processes large-scale recommendation datasets in a pool of disaggregated memory while making training fault tolerant with low overhead. To this end, i) we integrate persistent memory (PMEM) and GPU into a cache-coherent domain as Type-2. Enabling CXL allows PMEM to be placed directly in the GPU's memory hierarchy, so that the GPU can access PMEM without software intervention. TRAININGCXL introduces computing and checkpointing logic near the CXL controller, thereby processing training data and managing persistency in an active manner. Considering PMEM's vulnerability, ii) we exploit the unique characteristics of recommendation models to take the checkpointing overhead off the critical path of their training. Lastly, iii) TRAININGCXL employs an advanced checkpointing technique that relaxes the updating sequence of model parameters and embeddings across training batches. The evaluation shows that TRAININGCXL achieves a 5.2x training performance improvement and 76% energy savings compared to modern PMEM-based recommendation systems.

arXiv.org e-Print Archive

GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets

Authors: Bae Hanyeoreum, Gouk Donghyun, Jang Junhyeok, Jung Myoungsoo, Kwon Miryeong
Publication date: 27/05/2023

We present GraphTensor, a comprehensive open-source framework that supports efficient parallel neural network processing on large graphs. GraphTensor offers a set of easy-to-use programming primitives that account for both graph and neural network execution behaviors from the beginning (graph sampling) to the end (dense data processing). Our framework runs diverse graph neural network (GNN) models in a destination-centric, feature-wise manner, which can significantly shorten training execution times on a GPU. In addition, GraphTensor rearranges multiple GNN kernels based on their system hyperparameters in a self-governing manner, thereby further reducing the processing dimensionality and latencies. From the end-to-end execution viewpoint, GraphTensor significantly shortens the service-level GNN latency by applying pipeline parallelism for efficient graph dataset preprocessing. Our evaluation shows that GraphTensor exhibits 1.4x better training performance than emerging GNN frameworks when executing large-scale, real-world graph workloads. For end-to-end services, GraphTensor reduces the training latencies of an advanced version of the GNN frameworks (optimized for multi-threaded graph sampling) by 2.4x, on average.

arXiv.org e-Print Archive

SimpleSSD: Modeling Solid State Drives for Holistic System Simulation

Authors: Abulila Ahmed, Jung Myoungsoo, Kandemir Mahmut, Kim Nam Sung, Kwon Miryeong, Shahidi Narges, Shalf John, Zhang Jie
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/09/2017

Existing solid state drive (SSD) simulators unfortunately lack hardware and/or software architecture models. Consequently, they are far from capturing the critical features of contemporary SSD devices. More importantly, while the performance of modern systems that adopt SSDs can vary based on their numerous internal design parameters and storage-level configurations, a full system simulation with traditional SSD models often requires unreasonably long runtimes and excessive computational resources. In this work, we propose SimpleSSD, a high-fidelity simulator that models all detailed characteristics of hardware and software, while simplifying the nondescript features of storage internals. In contrast to existing SSD simulators, SimpleSSD can easily be integrated into publicly available full system simulators. In addition, it can accommodate a complete storage stack and evaluate the performance of SSDs along with diverse memory technologies and microarchitectures. Thus, it facilitates simulations that explore the full design space at different levels of system abstraction. Comment: This paper has been accepted at IEEE Computer Architecture Letters (CAL).

arXiv.org e-Print Archive

eScholarship - University of California

FastDrain: Removing Page Victimization Overheads in NVMe Storage Stack

Authors: Jie Zhang, Mahmut Kandemir, Miryeong Kwon, Myoungsoo Jung, Nam Sung Kim, Sanghyun Han
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref